National Repository of Grey Literature
Compression of biological sequences
Šurín, Tomáš ; Mráz, František (advisor) ; Dvořák, Tomáš (referee)
The volume of data produced by next-generation sequencing platforms is growing faster than the available capacity of storage media. Sequencers mainly produce short DNA reads, but their output also contains other information, such as read reliability (quality) scores. This data must be archived even after a complete genome has been successfully assembled. The standard file format for this type of data is SAM (Sequence Alignment/Map) and its compressed binary version, BAM. In this thesis we describe the construction of a lossless compression scheme for files in the SAM/BAM format that achieves better compression ratios than BAM while retaining random access to data in the compressed file. The implementation is platform independent, allows simple configuration of the compression process, and is easy to extend, so it can be adapted to changes in current sequencing platforms as well as to changes in the SAM format.
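The abstract does not say how random access is retained. One common approach, sketched here purely as an illustration and not as the thesis's actual method, is to compress records in independent blocks and keep an index of block offsets, so that reading one record requires decompressing only its block. The function names, the block size, and the sample SAM-like records below are all hypothetical.

```python
import zlib


def compress_blocks(records, block_size=2):
    """Compress records in independent blocks.

    Returns the concatenated compressed bytes and an index holding
    (byte offset, compressed length) for each block.
    Assumes no record contains a newline, as is true of SAM lines.
    """
    blob = bytearray()
    index = []
    for start in range(0, len(records), block_size):
        chunk = "\n".join(records[start:start + block_size]).encode("utf-8")
        packed = zlib.compress(chunk)
        index.append((len(blob), len(packed)))
        blob.extend(packed)
    return bytes(blob), index


def read_record(blob, index, n, block_size=2):
    """Fetch record n by decompressing only the block that contains it."""
    offset, length = index[n // block_size]
    chunk = zlib.decompress(blob[offset:offset + length]).decode("utf-8")
    return chunk.split("\n")[n % block_size]


# Hypothetical, heavily shortened SAM-like records (tab-separated fields).
records = ["r1\t0\tchr1\t100", "r2\t16\tchr1\t150", "r3\t0\tchr2\t7"]
blob, index = compress_blocks(records)
third = read_record(blob, index, 2)  # touches only the second block
```

Smaller blocks make random reads cheaper but usually cost compression ratio, since each block is compressed without knowledge of the others.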
